Regression models for high-dimensional data with correlated errors

Authors

  • Weiqi Luo
  • Paul D. Baxter
  • Charles C. Taylor

Abstract

Consider the linear regression model

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is the $n \times 1$ response vector, $X$ is the $n \times p$ model matrix of predictors, and $\beta$ is the $p \times 1$ vector of coefficients to be estimated. For mathematical simplicity, the first coefficient is usually taken to be the intercept $\beta_0$, so that the first column of $X$ is the $n \times 1$ vector of ones. Because the intercept absorbs the mean effect of the included predictors, it can be removed from the model by centering the response and the predictors.

Rather than imposing the classical conditions on $\varepsilon$, we assume more generally that $\varepsilon \sim N(0, \Sigma_\varepsilon)$, where $\Sigma_\varepsilon$ is a positive definite matrix. The variance-covariance matrix can be written as $\Sigma_\varepsilon = \sigma_\varepsilon^2 \Omega_\varepsilon$, so that the classical model, $\Sigma_\varepsilon = \sigma_\varepsilon^2 I$, is obtained as a special case. When $\Omega_\varepsilon \neq I$, ordinary least squares is no longer efficient; the Generalised Least Squares (GLS) method addresses this loss of efficiency.

Efficient estimation of $\beta$ in the generalised linear regression model requires knowledge of $\Omega_\varepsilon$. To simplify matters, we first consider the case in which $\Omega_\varepsilon$ is a known, symmetric, positive definite matrix. Such a matrix can be decomposed as $\Omega_\varepsilon = C\Lambda C^{\top}$, where the columns of $C$ are the characteristic vectors (eigenvectors) of $\Omega_\varepsilon$ and $\Lambda$ is the diagonal matrix of its characteristic roots (eigenvalues). Now let $G^{\top} = C\Lambda^{-1/2}$, so that $\Omega_\varepsilon^{-1} = G^{\top}G$, where $G$ is square and nonsingular. Premultiplying model (1) by $G$ gives $Gy = GX\beta + G\varepsilon$, that is,

$$y^{*} = X^{*}\beta + \varepsilon^{*},$$

where $\operatorname{Var}(\varepsilon^{*}) = \sigma_\varepsilon^2 G\Omega_\varepsilon G^{\top} = \sigma_\varepsilon^2 I$, so the transformed model satisfies the classical assumptions.
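The whitening transformation described above can be checked numerically. The following is a minimal sketch in Python/NumPy, assuming $\Omega_\varepsilon$ is known exactly and $\sigma_\varepsilon^2 = 1$. The helper names `gls_via_whitening` and `ar1_covariance`, the AR(1) form $\Omega_{ij} = \rho^{|i-j|}$ with $\rho = 0.7$, and the simulated data are hypothetical illustrative choices, not taken from the paper. The sketch builds $G = \Lambda^{-1/2}C^{\top}$ from the eigendecomposition, so that $G^{\top}G = \Omega_\varepsilon^{-1}$ and $G\Omega_\varepsilon G^{\top} = I$, runs ordinary least squares on the transformed model, and compares the result with the closed-form GLS estimator $(X^{\top}\Omega_\varepsilon^{-1}X)^{-1}X^{\top}\Omega_\varepsilon^{-1}y$.

```python
import numpy as np


def gls_via_whitening(X, y, Omega):
    """Hypothetical helper: estimate beta in y = X beta + eps, Var(eps) = sigma^2 * Omega.

    Eigendecompose Omega = C Lambda C^T, set G = Lambda^{-1/2} C^T
    (so that G^T G = Omega^{-1}), premultiply the model by G, and
    run ordinary least squares on the transformed data.
    """
    # eigh exploits the symmetry of Omega; eigenvectors are the columns of C
    lam, C = np.linalg.eigh(Omega)
    if np.any(lam <= 0):
        raise ValueError("Omega must be positive definite")
    G = np.diag(lam ** -0.5) @ C.T
    X_star, y_star = G @ X, G @ y
    # OLS on the whitened model y* = X* beta + eps*
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta_hat, G


def ar1_covariance(n, rho):
    """Hypothetical example structure: Omega[i, j] = rho ** |i - j| (AR(1) errors)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


# Simulated data (illustrative only); first column of X is the intercept
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Omega = ar1_covariance(n, rho=0.7)
# Correlated errors with Var(eps) = Omega (sigma^2 = 1 here)
eps = np.linalg.cholesky(Omega) @ rng.normal(size=n)
y = X @ beta_true + eps

beta_gls, G = gls_via_whitening(X, y, Omega)

# Closed-form GLS estimator for comparison, using solves instead of an explicit inverse
Omega_inv_X = np.linalg.solve(Omega, X)
Omega_inv_y = np.linalg.solve(Omega, y)
beta_closed = np.linalg.solve(X.T @ Omega_inv_X, X.T @ Omega_inv_y)

print(np.allclose(beta_gls, beta_closed))  # True
print(np.allclose(G @ Omega @ G.T, np.eye(n)))  # True
```

The choice of $G$ is not unique: any nonsingular $G$ with $G^{\top}G = \Omega_\varepsilon^{-1}$ (for example, the transpose of a Cholesky factor of $\Omega_\varepsilon^{-1}$) gives the same $X^{*\top}X^{*}$ and $X^{*\top}y^{*}$, and hence the same estimator.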

Similar resources

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: As science, knowledge, and technology evolve, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are coefficient estimation and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...

Using latent variables in the logistic regression model to remove the effect of multicollinearity in the analysis of some factors associated with breast cancer

Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analyzing the relationships between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) considerably reduce the efficiency of the model. In this study we used latent variables to reduce the effects of ...

Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research are ordinal. Conventional modeling methods assume that predictor variables are independent and that the number of samples (n) is large compared with the number of covariates (p). Therefore, conventional models cannot be used for high-dimensional genetic data in which p > n. The present study compared th...

Wavelet Threshold Estimator of Semiparametric Regression Function with Correlated Errors

Wavelet analysis is a useful mathematical technique that has recently been widely used in statistics. In this paper, in addition to introducing the wavelet transformation, the wavelet threshold estimator of a semiparametric regression model with correlated Gaussian errors is determined and the convergence rate of the estimator is computed. To evaluate the wavelet th...

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can be decisive when analyzing high-dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...

A Comparison of Thin Plate and Spherical Splines with Multiple Regression

Thin plate and spherical splines are nonparametric methods suitable for spatial data analysis. Thin plate splines provide efficient, practical, high-precision solutions for spatial interpolation. Two components are considered in model fitting: the spatial deviations of the data and the roughness of the model. On the other hand, in parametric regression, the relationship between explanatory and response v...

Publication date: 2007